{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Plotting Validation Curves\n\nIn this example the impact of the :class:`~imblearn.over_sampling.SMOTE`'s\n`k_neighbors` parameter is examined. In the plot you can see the validation\nscores of a SMOTE-CART classifier for different values of the\n:class:`~imblearn.over_sampling.SMOTE`'s `k_neighbors` parameter.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: Christos Aridas\n#          Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: MIT"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\nimport seaborn as sns\n\nsns.set_context(\"poster\")\n\n\nRANDOM_STATE = 42"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's first generate a dataset with imbalanced class distribution.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from sklearn.datasets import make_classification\n\nX, y = make_classification(\n    n_classes=2,\n    class_sep=2,\n    weights=[0.1, 0.9],\n    n_informative=10,\n    n_redundant=1,\n    flip_y=0,\n    n_features=20,\n    n_clusters_per_class=4,\n    n_samples=5000,\n    random_state=RANDOM_STATE,\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will use an over-sampler :class:`~imblearn.over_sampling.SMOTE` followed\nby a :class:`~sklearn.tree.DecisionTreeClassifier`. The aim will be to\nsearch which `k_neighbors` parameter is the most adequate with the dataset\nthat we generated.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from imblearn.over_sampling import SMOTE\nfrom imblearn.pipeline import make_pipeline\nfrom sklearn.tree import DecisionTreeClassifier\n\nmodel = make_pipeline(\n    SMOTE(random_state=RANDOM_STATE), DecisionTreeClassifier(random_state=RANDOM_STATE)\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can use the :class:`~sklearn.model_selection.validation_curve` to inspect\nthe impact of varying the parameter `k_neighbors`. In this case, we need\nto use a score to evaluate the generalization score during the\ncross-validation.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from sklearn.metrics import cohen_kappa_score, make_scorer\nfrom sklearn.model_selection import validation_curve\n\nscorer = make_scorer(cohen_kappa_score)\nparam_range = range(1, 11)\ntrain_scores, test_scores = validation_curve(\n    model,\n    X,\n    y,\n    param_name=\"smote__k_neighbors\",\n    param_range=param_range,\n    cv=3,\n    scoring=scorer,\n)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "train_scores_mean = train_scores.mean(axis=1)\ntrain_scores_std = train_scores.std(axis=1)\ntest_scores_mean = test_scores.mean(axis=1)\ntest_scores_std = test_scores.std(axis=1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can now plot the results of the cross-validation for the different\nparameter values that we tried.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n\nfig, ax = plt.subplots(figsize=(7, 5))\nax.plot(param_range, test_scores_mean, label=\"SMOTE\")\nax.fill_between(\n    param_range,\n    test_scores_mean + test_scores_std,\n    test_scores_mean - test_scores_std,\n    alpha=0.2,\n)\nidx_max = test_scores_mean.argmax()\nax.scatter(\n    param_range[idx_max],\n    test_scores_mean[idx_max],\n    label=r\"Cohen Kappa: ${:.2f}\\pm{:.2f}$\".format(\n        test_scores_mean[idx_max], test_scores_std[idx_max]\n    ),\n)\n\nfig.suptitle(\"Validation Curve with SMOTE-CART\")\nax.set_xlabel(\"k_neighbors\")\nax.set_ylabel(\"Cohen's kappa\")\n\n# make nice plotting\nsns.despine(ax=ax, offset=10)\nax.set_xlim([1, 10])\nax.set_ylim([0.4, 0.8])\nax.legend(loc=\"lower right\")\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}